Abstract: The spike protein of the severe acute respiratory syndrome coronavirus (SARS-CoV) mediates
cell fusion by binding to target cell surface receptors. This paper reports a simple method for dissecting the
viral protein and for searching for foldable fragments in a random but systematic manner. The method involves
digestion by DNase I to generate a pool of short DNA segments, followed by an additional step of reassembly
of these segments to produce a library of DNA fragments with random ends but controllable
lengths. To rapidly screen for discrete folded polypeptide fragments, the reassembled gene fragments were
further cloned into a vector as N-terminal fusions to a folding reporter gene, a variant of green
fluorescent protein. Two foldable fragments were identified for the SARS-CoV spike protein, which coincide
with various anti-SARS peptides derived from the heptad repeat (HR) region 2 of the spike protein. The
method should be applicable to other viral proteins to isolate antigen or vaccine candidates, thus providing
an alternative to the full-length proteins (subunits) or linear short peptides.
Introduction

Severe acute respiratory syndrome coronavirus (SARS-CoV), the causative agent of the atypical
pneumonia, was first identified in the fall of 2002 as a previously unknown member of the family
of coronaviruses [1]. The rapid transmission by means of aerosols and the high mortality rate
(up to 10%) make SARS a potential global threat. An attractive approach to interfere with SARS
disease progression focuses on one of the earliest infection processes by blocking the fusion
process that mediates the delivery of the viral genome into the host cell. The spike protein
(S protein) is a 180- to 200-kDa type-I transmembrane glycoprotein that is responsible for the
initiation and propagation of infection by interacting with a cellular receptor to induce
cell-to-cell fusion. Binding of the S protein to a specific soluble or cell surface glycoprotein
receptor induces global conformational changes in the S protein that expose a previously hidden
hydrophobic surface area, which allows the virions to interact with the host cell membrane [1-3].
Earlier attempts to express this S protein in Escherichia coli (E. coli) failed to yield soluble
full-length polypeptides. Subsequent work aimed at identifying smaller but folded SARS-CoV spike
fragments for use as possible antigen or vaccine candidates. The approach involved digestion and
reassembly of the target gene to generate a pool of smaller DNA fragments with random ends but
controllable lengths, which were screened for foldable fragments using a green fluorescent
protein as a folding reporter [4,5]. Two foldable fragments were identified, which coincide with
various SARS peptides reported to have SARS neutralization activity [6-8]. This dissection
approach has the potential to be a generally applicable tool for producing foldable fragments of
viral surface proteins that may provide discontinuous epitopes. These fragments should be easier
to express in E. coli or other recombinant hosts.

1 Materials and Methods 

1.1 pET30a-linker-GFP construction 

The green fluorescent protein (GFP) gene was amplified from an in-house GFP-containing vector,
pET30a-hydA (Wang and Lin, unpublished result), which was in turn constructed from pQB-2 [9],
and then ligated into pET30a(+) (Novagen) to yield pET30a-linker-GFP.

1.2 Fragment library construction 

The SARS-CoV spike gene was obtained from the Huada Beijing Genomics Institute. Fragmentation
and reassembly of the target gene were performed as described by Lorimer and Pastan [10]. The
reassembled DNA sample was then purified and phosphorylated with T4 polynucleotide kinase at
37°C for 30 min. The backbone vector pET30a-linker-GFP was digested with EcoR I, blunt-ended
with T4 DNA polymerase in the presence of 0.1 mmol/L of each dNTP, and purified with a QIAgen®
gel purification kit to remove residual enzyme activity. The linearized and blunt-ended vector
was then dephosphorylated with shrimp alkaline phosphatase (SAP), followed by heat denaturation
to inactivate the enzyme. The gene fragments and the backbone vector were ligated at 12°C
overnight in the presence of 5% PEG8000 and then transformed into E. coli BL21(DE3) (Novagen)
competent cells by electroporation.
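
Because the fragments have random ends and are ligated blunt into the vector, only some inserts can give an in-frame fusion to GFP. The following is a minimal sketch of that bookkeeping. It assumes (this is not stated in the paper) that insert orientation, the reading frame at the upstream junction, and the reading frame at the downstream junction are independent and equally likely. The function name `productive_fraction` is hypothetical.

```python
from fractions import Fraction


def productive_fraction(orientations: int = 2, reading_frames: int = 3) -> Fraction:
    """Idealized fraction of blunt-ligated inserts that can give an in-frame fusion.

    Assumption (not from the paper): fragment ends are uniformly random, so
    orientation and the frame at each junction are independent and equally likely.
    """
    forward = Fraction(1, orientations)        # insert ligated in the sense orientation
    frame_in = Fraction(1, reading_frames)     # insert read in the original spike frame
    frame_out = Fraction(1, reading_frames)    # downstream GFP remains in frame
    return forward * frame_in * frame_out


fraction = productive_fraction()
print(f"Idealized productive fraction: {fraction} (~{float(fraction):.1%})")
```

Under these assumptions roughly one insert in eighteen is productive, so a library needs to be many times larger than the number of in-frame fragments sought. Real libraries can deviate from this figure, for example through stop codons or biased fragment ends.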

1.3 Screening of fragments 

Transformed E. coli BL21(DE3) cells were plated on Luria-Bertani (LB) medium supplemented with
50 µg/mL kanamycin and grown overnight at 37°C, then grown further on a bench for about 20 h.
The fluorescent colonies were picked, tested by colony PCR using primers flanking the fragment
inserts, and sequenced. No isopropylthio-β-D-galactoside (IPTG) was used in these experiments,
as it would inhibit the formation of fluorescent colonies.

1.4 Expression analysis of fusion proteins 

Saturated overnight cultures were diluted 100-fold into LB medium containing 50 µg/mL kanamycin
and grown at 37°C for about 2 h to reach an optical density at 600 nm (OD600) of 0.5-0.6.
Protein expression was initiated with 0.2 mmol/L IPTG and continued for 4 h at 23°C. Cells were
then collected and lysed for soluble protein extraction. The supernatant fractions (soluble
protein) and cell pellets (insoluble protein) were resolved by SDS-PAGE using a 12% acrylamide
gel.

2 Results and Discussion 

This work sought to identify smaller but folded SARS-CoV spike fragments for use as possible
antigen or vaccine candidates. Compared with linear short peptides derived from the protein,
folded fragments may be advantageous as they have the potential to provide discontinuous
epitopes. The SARS-CoV spike gene was digested by DNase I to generate a pool of short DNA
segments, followed by an additional step of reassembly of these segments to produce a library
of DNA fragments with random ends [10,11]. The procedure is in part analogous to the DNA
shuffling protocol [12,13]; here, however, the purpose is not to produce full-length hybrids
from a group of different parental genes but to generate various smaller DNA fragments from a
single template gene. The reassembly step following the DNase I treatment is necessary to
prepare a large number of DNA sequences with controlled lengths, which was achieved by
tailoring the number of PCR cycles used in the reassembly (see Methods). To screen for discrete
folded polypeptide fragments, the reassembled gene fragments were further cloned into a vector
as N-terminal fusions to a folding reporter gene, a variant of the green fluorescent protein
that exhibits strong fluorescence upon UV excitation [9]. GFP has been shown to be an effective
indicator for the foldability of the upstream polypeptide partner [4,5]. The vector
construction is shown in Fig. 1.

Among about 4300 clones screened, 230 clones were found to be fluorescent (see Fig. 2a). These
clones were then subjected to rapid colony PCR analysis. Many of the fluorescent clones were
found to contain vectors with SARS spike gene fragments smaller than 100 base pairs (bp). In
addition, some others (a total of 20) contained vectors with inserts in the reverse orientation
or not in frame, as indicated by sequencing. The SDS-PAGE results showed that the peptides
encoded by these gene inserts were degraded in the corresponding fusion proteins (data not
shown). Finally, the two inserts larger than 150 nucleotides, or 50 deduced amino acid (aa)
residues, that were identified were ssPtu-15 (residues 1118-1175 of the original protein) and
ssPtu-16 (residues 1129-1186). The expression of these fragments (in the GFP-fusion form) was
further examined by SDS-PAGE. As shown in Figs. 2b and 2c, the fragments were partially soluble
when expressed at 23°C. Higher temperatures significantly reduced the amount of soluble protein
(data not shown).

Nde I EcoR I Hind III

CATATG|GGGAATTCTGCTGGCTCGAGTGCTGCTGGTTCTGGATCC| |AAGCTT

Linker GFP

Fig. 1 Expression construct (pET30a-linker-GFP). The sequence shown is flanked by the Nde I and
Hind III sites; the vector is otherwise identical to pET30a(+) (Novagen, Madison, WI). It
contains a linker sequence, GNSAGSSAAGSGS (boxed), upstream of the GFP gene and an internal
EcoR I site (underlined) used for insertion of gene fragments.
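
The linker region given in Fig. 1 can be checked against its stated peptide sequence with a few lines of code. The sketch below translates the DNA between the Nde I and GFP boundaries and locates the EcoR I recognition site (GAATTC). The codon table is deliberately restricted to the codons that occur in this linker, and the names `CODON_TO_AA`, `LINKER_DNA`, and `translate` are illustrative only.

```python
# Standard genetic code, restricted to the codons that occur in the Fig. 1 linker.
CODON_TO_AA = {
    "GGG": "G", "GGC": "G", "GGT": "G", "GGA": "G",
    "AAT": "N",
    "TCT": "S", "TCG": "S", "AGT": "S", "TCC": "S",
    "GCT": "A",
}

LINKER_DNA = "GGGAATTCTGCTGGCTCGAGTGCTGCTGGTTCTGGATCC"  # linker sequence from Fig. 1
ECORI_SITE = "GAATTC"                                   # EcoR I recognition sequence


def translate(dna: str) -> str:
    """Translate a DNA string in frame 0, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, usable, 3))


linker_peptide = translate(LINKER_DNA)
ecori_offset = LINKER_DNA.find(ECORI_SITE)  # 0-based index within the linker DNA

print("Linker peptide:", linker_peptide)
print("EcoR I site starts at linker nucleotide", ecori_offset + 1)
```

The translation reproduces the 13-residue linker quoted in the figure caption, and the EcoR I site sits within the first three codons, so inserted fragments are placed upstream of most of the linker.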

Widely disparate virus families have been shown to contain two heptad repeat (HR) regions,
which play a critical role in viral fusion with the target cell [14,15]. Often, one N-terminal
HR region (HR1) is adjacent to the cell fusion peptide while a C-terminal HR region (HR2) is
close to the transmembrane anchor. The SARS fragment ssPtu-15 isolated in this work overlaps
with the HR2 (residues 1147-1185) of the SARS-CoV spike protein [16], while fragment ssPtu-16
contains the whole SARS HR2 (Fig. 3a). A hydrophobic cluster analysis [17] of these two
fragments showed two significant hydrophobic clusters (Figs. 3b, 3c, and 3d). Presumably, these
clusters play a role in the stability and oligomeric specificity of the HR2 structure [18]. In
addition, compared with the wild-type sequence, both of the fragments contain a mutation at
1163 (K replaced by E). ssPtu-16 also contains a second mutation at 1151 (from I to T), while
ssPtu-15 contains a second mutation at 1157 (S substituted by Y). These mutations likely result
from the fragment reassembly process. The mutation at 1163 seems to increase the helicity of
the fragments. Interestingly, our more recent dissection studies with other proteins rarely
resulted in mutations.
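
The residue numbering used above can be cross-checked against the fragment sequences. The sketch below is illustrative, with fragment sequences transcribed from Fig. 3 and ssPtu-16 taken as the 58-residue form that matches residues 1129-1186. It maps full-length spike positions onto each fragment, reports the residue found at the three mutated positions, and computes the overlap with HR2 (1147-1185). The `Fragment` type and helper names are hypothetical.

```python
from typing import NamedTuple, Tuple


class Fragment(NamedTuple):
    name: str
    start: int      # first residue in full-length spike numbering
    sequence: str   # one-letter amino acid sequence (transcribed from Fig. 3)


SSPTU_15 = Fragment("ssPtu-15", 1118,
                    "TVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINAYVVNIQEEIDRLNEVAKNL")
SSPTU_16 = Fragment("ssPtu-16", 1129,
                    "SFKEELDKYFKNHTSPDVDLGDTSGINASVVNIQEEIDRLNEVAKNLNESLIDLQELG")
HR2: Tuple[int, int] = (1147, 1185)  # HR2 boundaries quoted in the text


def last_residue(frag: Fragment) -> int:
    """Full-length position of the fragment's final residue."""
    return frag.start + len(frag.sequence) - 1


def residue_at(frag: Fragment, position: int) -> str:
    """Residue of the fragment at a full-length spike position."""
    return frag.sequence[position - frag.start]


def overlap(frag: Fragment, region: Tuple[int, int]) -> int:
    """Number of residues shared by the fragment and an inclusive region."""
    lo = max(frag.start, region[0])
    hi = min(last_residue(frag), region[1])
    return max(0, hi - lo + 1)


for frag in (SSPTU_15, SSPTU_16):
    print(frag.name, f"{frag.start}-{last_residue(frag)}",
          "residue 1163:", residue_at(frag, 1163),
          "HR2 overlap:", overlap(frag, HR2))
print("ssPtu-15 residue 1157:", residue_at(SSPTU_15, 1157))
print("ssPtu-16 residue 1151:", residue_at(SSPTU_16, 1151))
```

With these sequences, both fragments carry E at 1163, ssPtu-15 carries Y at 1157, and ssPtu-16 carries T at 1151, in agreement with the substitutions described. ssPtu-16 spans all 39 HR2 positions, whereas ssPtu-15 covers the first 29.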



(a) Colonies obtained from insertion and expression of SARS-CoV spike gene fragments in
pET30a-linker-GFP using E. coli BL21(DE3) as the host.

GFP C ssPtu-15 ssPtu-16

(b) … control (denoted as “C”). All the pictures were taken under UV irradiation.

(c) Coomassie brilliant blue-stained 12% acrylamide SDS-PAGE using E. coli BL21(DE3) as control
(denoted as “C”). Calculated molecular masses for GFP, ssPtu-15, and ssPtu-16 were 29.9 kDa,
36.3 kDa, and 36.3 kDa, respectively. Corresponding band positions are indicated by arrows;
“s” indicates supernatants of lysates and “in” denotes insoluble pellets of the lysates.
M: protein marker, broad range (NEB), whose bands were 175, 83, 62, 48, 33, 25, and 17 kDa,
respectively.

Fig. 2 Expression of fusion proteins
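
The calculated masses quoted in the Fig. 2 caption can be sanity-checked with a rough rule of thumb. The sketch below assumes an average residue mass of about 110 Da, a common approximation that is not taken from the paper. It adds the estimated mass of a 58-residue insert to the 29.9 kDa quoted for the GFP construct. The constant and function names are hypothetical.

```python
AVG_RESIDUE_MASS_DA = 110.0   # rule-of-thumb average residue mass (assumption)
GFP_CONSTRUCT_KDA = 29.9      # calculated mass quoted in the Fig. 2 caption
INSERT_LENGTH_AA = 58         # residues 1118-1175 or 1129-1186, inclusive


def estimate_fusion_mass_kda(base_kda: float, insert_residues: int) -> float:
    """Approximate fusion mass as base construct plus an average-mass insert."""
    return base_kda + insert_residues * AVG_RESIDUE_MASS_DA / 1000.0


estimate = estimate_fusion_mass_kda(GFP_CONSTRUCT_KDA, INSERT_LENGTH_AA)
print(f"Estimated fusion mass: {estimate:.1f} kDa")
```

The estimate is close to the 36.3 kDa given for both fusions. An exact value would require summing the actual residue masses of each fragment, which this approximation deliberately skips.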
[Fig. 3a: sequence alignment panel with rows CP-1, HR2, GST-HR2-38, GST-HR2-44, SHR2-2, SHR2-1,
SHR2-8, SHR2-9, ssPtu-16, and ssPtu-15; ruler positions 1-70]
(a) CLUSTALW alignment of ssPtu-15 and ssPtu-16 with HR2-derived peptides which interfere with SARS-CoV S-mediated fusion to host
cells: peptide CP-1 [7], peptides HR2, GST-HR2-38, GST-HR2-44 [8], peptides SHR2-1, SHR2-2, SHR2-8, and SHR2-9 [6].


ssPtu-15 TVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINAYVVNIQEEIDRLNEVAKNL 
EE-HHHHHHHHHHHH-EEEEEHHHHHHHHHHHHHH- 


ssPtu-16 SFKEELDKYFKNHTSPDVDLGDTSGINASVVNIQEEIDRLNEVAKNLNESLIDLQELG
-HHHHHHHHHHH-HHH—EEEE-HHHHHHHHHHHHHHHHHHH—HHHH- 

(b) Secondary structure prediction for ssPtu-15 and ssPtu-16 by 3D-Jury (http://bioinfo.pl/Meta/) [19]. E, β-strand; H, α-helix.






(c) and (d) Hydrophobic cluster analysis (HCA) plots for ssPtu-15 and ssPtu-16 drawn using DRAWHCA (http://bioserv.rpbs.jussieu.fr) [17].
Protein sequences are displayed on a duplicated helix using one-letter codes for the amino acids except for prolines (★), glycines (♦),
threonine (□), and serine (0). Hydrophobic residues are automatically contoured.

Fig. 3 Sequence analysis 


Several studies [6-8] have reported that SARS-CoV S-mediated fusion can be inhibited by
HR2-derived but not HR1-derived peptides, most likely by interfering with the six-helix bundle
formation, a process essential to drive the membrane fusion reaction and to initiate infection
[14]. For the majority of these peptides, micromolar concentrations were required for efficient
inhibition of the viral infection, indicating that although these peptides are effective,
further optimization is required to achieve efficient inhibition of SARS-CoV in infected
individuals. Given the high similarity of ssPtu-15 and ssPtu-16 to these peptides derived from
the HR2 region [6-8], ssPtu-15 and ssPtu-16 may both have potential as therapeutic agents for
the direct inhibition of SARS-CoV cell entry, as an anti-SARS vaccine, and in high-throughput
assays for screening small-molecule inhibitors of SARS envelope-mediated cell fusion.

In summary, the dissection approach described in this study has the potential to produce
foldable fragments of viral surface proteins that may be useful for the design of antiviral
compounds and provide alternative antigen or vaccine candidates. The method is target protein
independent and thus can be applied to various viral proteins. The process is also simple and
rapid. The method should also be applicable to dissecting and understanding non-viral proteins,
for example, to identify smaller polypeptide units that are structurally, functionally, or
evolutionarily relevant.